Supplementary Material: Asynchronous Stochastic Gradient Descent with Delay Compensation

Abstract

where $C_{ij} = \frac{1}{1+\lambda}\left(\frac{u_i u_j \beta}{l_i l_j \sqrt{\alpha}}\right)$, $C'_{ij} = \frac{1}{(1+\lambda)\alpha\, l_i l_j}$, and the model converges to the optimal model, then the MSE of $\lambda G(w_t)$ is smaller than the MSE of $G(w_t)$ in approximating the Hessian $H(w_t)$. Proof: For simplicity, we abbreviate $\mathbb{E}(Y|x, w^*)$ as $\mathbb{E}$, $G(w_t)$ as $G_t$, and $H(w_t)$ as $H_t$. First, we calculate the MSE of $G_t$ and $\lambda G_t$ in approximating $H_t$ for each element of $G_t$. We denote the element in the $i$-th row and $j$-th column of $G(w_t)$ by $G_{ij}$ and that of $H(w_t)$ by $H_{ij}$. The MSE of $G_{ij}$ is

$$\mathbb{E}(G_{ij} - \mathbb{E}H_{ij})^2 = \mathbb{E}(G_{ij} - \mathbb{E}G_{ij})^2 + (\mathbb{E}H_{ij} - \mathbb{E}G_{ij})^2 = \mathbb{E}(G_{ij}^2) - (\mathbb{E}G_{ij})^2 + \varepsilon_t \quad (2)$$

where $\varepsilon_t = (\mathbb{E}H_{ij} - \mathbb{E}G_{ij})^2$ is the squared bias term.
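The decomposition in Eq. (2) is the standard bias–variance split of a mean-squared error. A minimal numerical sketch (scalar toy values, not the paper's Gauss–Newton matrices) checks the identity and illustrates why a shrinkage factor $\lambda < 1$ can lower the MSE when the variance term dominates the bias:

```python
import numpy as np

# Numerical check of the bias-variance decomposition behind Eq. (2):
# E(G - EH)^2 = Var(G) + (EH - EG)^2, and an illustration that a
# shrinkage factor lam < 1 can reduce the MSE of lam*G as an estimator
# when variance dominates bias. All values here are illustrative.

rng = np.random.default_rng(0)
EH = 1.0  # the target (expectation of a Hessian entry)
# noisy, slightly biased estimator of EH
G = EH + 0.05 + rng.normal(scale=0.5, size=1_000_000)

mse = np.mean((G - EH) ** 2)
decomp = np.var(G) + (np.mean(G) - EH) ** 2
print(abs(mse - decomp) < 1e-9)   # the decomposition is an exact identity

lam = 0.9
mse_shrunk = np.mean((lam * G - EH) ** 2)
print(mse_shrunk < mse)           # shrinkage helps in this variance-dominated case
```

The identity holds exactly on the samples; whether shrinkage helps depends on the variance-to-bias ratio, which is the trade-off the theorem's conditions on $C_{ij}$ and $C'_{ij}$ formalize.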


Similar articles

Asynchronous Stochastic Gradient Descent with Delay Compensation

With the fast development of deep learning, people have started to train very big neural networks using massive data. Asynchronous Stochastic Gradient Descent (ASGD) is widely used to fulfill this task, which, however, is known to suffer from the problem of delayed gradient. That is, when a local worker adds the gradient it calculates to the global model, the global model may have been updated ...
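The delayed-gradient problem described above can be sketched in a few lines. The compensated update below follows the spirit of the paper's delay-compensation idea: a gradient computed at a stale model is corrected using a cheap diagonal (gradient-outer-product) Hessian approximation. Variable names and constants are illustrative, not the paper's exact algorithm:

```python
import numpy as np

# Sketch of one delay-compensated ASGD update. A worker computed its
# gradient g at w_stale, but the global model has since moved to
# w_global. The correction term approximates the Hessian by the
# elementwise square of the gradient, scaled by lam.

def dc_asgd_update(w_global, w_stale, g, lr=0.1, lam=0.04):
    # g was evaluated at w_stale; compensate for the drift w_global - w_stale
    compensated = g + lam * g * g * (w_global - w_stale)
    return w_global - lr * compensated

w_stale = np.array([1.0, -2.0])
g = 2 * w_stale                    # gradient of f(w) = ||w||^2 at the stale point
w_global = np.array([0.8, -1.6])   # model already updated by other workers
w_new = dc_asgd_update(w_global, w_stale, g)
print(w_new)
```

Without the correction term, this reduces to plain ASGD, which applies the stale gradient to the newer model unchanged.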


Accelerating Asynchronous Algorithms for Convex Optimization by Momentum Compensation

Asynchronous algorithms have attracted much attention recently due to the crucial demands on solving large-scale optimization problems. However, the accelerated versions of asynchronous algorithms are rarely studied. In this paper, we propose the “momentum compensation” technique to accelerate asynchronous algorithms for convex problems. Specifically, we first accelerate the plain Asynchronous ...


Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization

Nowadays, asynchronous parallel algorithms have received much attention in the optimization field due to the crucial demands of modern large-scale optimization problems. However, most asynchronous algorithms focus on convex problems; analysis of nonconvex problems is lacking. For the Asynchronous Stochastic Gradient Descent (ASGD) algorithm, the best result from (Lian et al., 2015) can only achieve an ...


The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory

Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on distributed machine learning, significant work has been dedicated to the convergence properties of this algorithm under the inconsistent and noisy updates arising from...


Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization

We provide the first theoretical analysis on the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) descent algorithm on nonconvex optimization. Recent studies have shown that the asynchronous stochastic gradient descent (SGD) based algorithms with variance reduction converge with a linear convergent rate on convex problems. However, there is no work to analyze asy...
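The variance-reduction building block these asynchronous SVRG variants share is the standard SVRG gradient estimator: a per-sample gradient corrected by the same sample's gradient at a snapshot plus the snapshot's full gradient. A minimal serial sketch on a least-squares toy problem (names and constants are illustrative):

```python
import numpy as np

# Minimal sketch of the SVRG variance-reduced gradient estimator:
# v = grad_i(w) - grad_i(w_snap) + mu, where mu is the full gradient
# at the snapshot w_snap. The estimator is unbiased, and its variance
# shrinks as w and w_snap approach the optimum.

rng = np.random.default_rng(1)
A = rng.normal(size=(100, 5))
b = rng.normal(size=100)

def grad_i(w, i):                  # gradient of 0.5 * (a_i . w - b_i)^2
    return (A[i] @ w - b[i]) * A[i]

def full_grad(w):
    return A.T @ (A @ w - b) / len(b)

w_snap = np.zeros(5)
mu = full_grad(w_snap)             # full gradient at the snapshot

w, lr = w_snap.copy(), 0.01
for _ in range(200):
    i = rng.integers(len(b))
    v = grad_i(w, i) - grad_i(w_snap, i) + mu   # variance-reduced direction
    w -= lr * v

print(np.linalg.norm(full_grad(w)) < np.linalg.norm(full_grad(w_snap)))
```

The asynchronous variants analyzed in the paper run these inner updates from multiple workers against a shared model; the estimator itself is unchanged.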




Publication date: 2017